Recent Advances in Augmented Reality

In 1997, Azuma published a survey on augmented reality (AR). Here we provide a complement to that survey, documenting the rapid technological advancements since then.

What is augmented reality? An AR system supplements the real world with virtual (computer-generated) objects that appear to coexist in the same space as the real world. While many researchers broaden the definition of AR beyond this vision, we define an AR system to have the following properties:

■ combines real and virtual objects in a real environment;

■ runs interactively, and in real time; and

■ registers (aligns) real and virtual objects with each other.


Note that we don’t restrict this definition of AR to particular display technologies, such as a head-mounted display (HMD). Nor do we limit it to our sense of sight. AR can potentially apply to all senses, including hearing, touch, and smell. Certain AR applications also require removing real objects from the perceived environment, in addition to adding virtual objects. For example, an AR visualization of a building that stood at a certain location might remove the building that exists there today. Some researchers call the task of removing real objects mediated or diminished reality, but we consider it a subset of AR.

Milgram1 defined a continuum of real-to-virtual environments, in which AR is one part of the general area of mixed reality (Figure 1). In both augmented virtuality, in which real objects are added to virtual ones, and virtual environments (or virtual reality), the surrounding environment is virtual, while in AR the surrounding environment is real. We focus on AR and don’t cover augmented virtuality or virtual environments.
The beginnings of AR, as we define it, date back to Sutherland’s work in the 1960s, which used a see-through HMD to present 3D graphics.2 However, only over the past decade has there been enough work to refer to AR as a research field. In 1997, Azuma published a survey3 that defined the field, described many problems, and summarized the developments up to that point. Since then, AR’s growth and progress have been remarkable. In the late 1990s, several conferences on AR began, including the International Workshop and Symposium on Augmented Reality, the International Symposium on Mixed Reality, and the Designing Augmented Reality Environments workshop. Some well-funded organizations formed that focused on AR, notably the Mixed Reality Systems Lab in Japan and the Arvika consortium in Germany. A software toolkit (the ARToolKit) for rapidly building AR applications is now freely available at http://www.hitl.washington.edu/research/shared_space/. Because of the wealth of new developments, this field needs an updated survey to guide and encourage further research in this exciting area.

Our goal here is to complement, rather than replace, the original survey by presenting representative examples of the new advances. We refer you to the original survey3 for descriptions of potential applications (such as medical visualization, maintenance and repair of complex equipment, annotation, and path planning); summaries of AR system characteristics (such as the advantages and disadvantages of optical and video approaches to blending virtual and real, problems in display focus and contrast, and system portability); and an introduction to the crucial problem of registration, including sources of registration error and error-reduction strategies.

Enabling technologies

One category for new developments is enabling technologies. Enabling technologies are advances in the basic technologies needed to build compelling AR environments. Examples of these technologies include displays, tracking, registration, and calibration.

Displays

We can classify displays for viewing the merged virtual and real environments into the following categories: head-worn, handheld, and projective.

Head-worn displays (HWD). Users mount this type of display on their heads, where it provides imagery in front of their eyes. Two types of HWDs exist: optical see-through and video see-through (Figure 2). The latter uses video capture from head-worn video cameras as a background for the AR overlay, shown on an opaque display, whereas the optical see-through method provides the AR overlay through a transparent display.
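The practical difference between the two methods shows up at the level of individual pixels. The sketch below is our own simplified illustration rather than a model of any particular display: intensities are assumed to lie in [0, 1], and the function names and the 0.5 combiner transmittance are hypothetical. A video see-through system controls every output pixel and can blend the overlay over the camera image; an optical combiner can only add display light to the light already arriving from the real scene.

```python
def video_see_through(video_px: float, virtual_px: float, alpha: float) -> float:
    """Video see-through: the system owns every output pixel, so it can
    composite the rendered overlay over the camera image ("over" blending)."""
    return alpha * virtual_px + (1.0 - alpha) * video_px


def optical_see_through(real_px: float, virtual_px: float,
                        transmittance: float = 0.5) -> float:
    """Optical see-through (simplified): the combiner passes a fraction of the
    real-world light and the display adds its own light on top.  The display
    can only add light, so it cannot darken the real scene.  The 0.5
    transmittance is an illustrative assumption; the result is clipped to 1."""
    return min(1.0, transmittance * real_px + virtual_px)


# Drawing a black (0.0) virtual pixel over a bright (0.9) real pixel:
print(video_see_through(video_px=0.9, virtual_px=0.0, alpha=1.0))  # 0.0: fully replaced
print(optical_see_through(real_px=0.9, virtual_px=0.0))            # 0.45: real scene shows through
```

This additive behavior is why conventional optical see-through displays can't make virtual objects fully occlude real ones, one of the display problems discussed below.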

Established electronics and optical companies (for example, Sony and Olympus) have manufactured color, liquid crystal display (LCD)-based consumer head-worn displays intended for watching videos and playing video games. While these systems have relatively low resolution (180,000 to 240,000 pixels), small fields of view (approximately 30 degrees horizontal), and don’t support stereo, they’re relatively lightweight (under 120 grams) and offer an inexpensive option for video see-through research. Sony introduced 800 × 600 resolution, stereo, color optical see-through displays (later discontinued) that have been used extensively in AR research.

A different approach is the virtual retinal display,4 which forms images directly on the retina. These displays, which MicroVision is developing commercially, literally draw on the retina with low-power lasers whose modulated beams are scanned by microelectromechanical mirror assemblies that sweep the beam horizontally and vertically. Potential advantages include high brightness and contrast, low power consumption, and large depth of field.

Ideally, head-worn AR displays would be no larger than a pair of sunglasses. Several companies are developing displays that embed display optics within conventional eyeglasses. MicroOptical produced a family of eyeglass displays in which two right-angle prisms are embedded in a regular prescription eyeglass lens and reflect the image of a small color display, mounted facing forward on an eyeglass temple piece.5 The Minolta prototype forgettable display is intended to be light and inconspicuous enough that users forget that they’re wearing it.6 Other people see only a transparent lens, with no indication that the display is on, and the display adds less than 6 grams to the weight of the eyeglasses (Figure 3).

Handheld displays. Some AR systems use handheld, flat-panel LCD displays that use an attached camera to provide video see-through-based augmentations.7 The handheld display acts as a window or a magnifying glass that shows the real objects with an AR overlay.

Projection displays. In this approach, the desired virtual information is projected directly on the physical objects to be augmented. In the simplest case, the augmentations are intended to be coplanar with the surface onto which they are projected and are projected from a single room-mounted projector, with no need for special eyewear. Generalizing on the concept of a multiwalled Cave automatic virtual environment (CAVE), Raskar and colleagues8 show how multiple overlapping projectors can cover large irregular surfaces using an automated calibration procedure that takes into account surface geometry and image overlap.

Another approach for projective AR relies on head-worn projectors, whose images are projected along the viewer’s line of sight at objects in the world. The target objects are coated with a retroreflective material that reflects light back along the angle of incidence. Multiple users can see different images on the same target projected by their own head-worn systems, since the projected images can’t be seen except along the line of projection. When relatively low-output projectors are used, nonretroreflective real objects can obscure virtual objects. However, projectors worn on the head can be heavy. Figure 4 shows a new, relatively lightweight prototype that weighs less than 700 grams.9

One interesting application of projection systems is in mediated reality. Coating a haptic input device with retroreflective material and projecting a model of the scene without the device camouflages the device by making it appear semitransparent10 (Figure 5). Other applications using projection displays include a remote laser pointer control11 and a simulation of a virtual optical bench.12

Problem areas in AR displays. See-through displays don’t have sufficient brightness, resolution, field of view, and contrast to seamlessly blend a wide range of real and virtual imagery. Furthermore, size, weight, and cost are still problems. However, there have been advances on specific problems. First, in conventional optical see-through displays, virtual objects can’t completely occlude real ones. One experimental display addresses this by interposing an LCD panel between the optical combiner and the real world, blocking the real-world view at selected pixels13 (Figure 6).
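The per-pixel decision behind such an occlusion-capable display can be sketched as a depth comparison. This minimal sketch assumes a real-world depth map that is already aligned pixel for pixel with the virtual depth buffer (as in Figure 6b); how that depth map is obtained is outside its scope, and the function name is hypothetical.

```python
import math

NO_SURFACE = math.inf  # depth value meaning "nothing at this pixel"


def virtual_in_front(virtual_depth, real_depth):
    """Per-pixel mask: True where a virtual surface is closer than the real scene.

    Both arguments are same-sized 2D lists of distances from the viewer in
    meters.  In an occlusion-capable optical see-through display, the same mask
    selects which virtual pixels to draw and which LCD pixels to make opaque so
    that the real world behind them is blocked.
    """
    return [[v < r for v, r in zip(v_row, r_row)]
            for v_row, r_row in zip(virtual_depth, real_depth)]


virtual = [[1.0, 1.0, NO_SURFACE],
           [1.0, 3.0, NO_SURFACE]]
real = [[2.0, 0.5, 2.0],
        [2.0, 2.0, 2.0]]
for row in virtual_in_front(virtual, real):
    print(row)
# [True, False, False]   <- middle pixel: a real object at 0.5 m hides the virtual one
# [True, False, False]   <- middle pixel: the virtual surface at 3 m is behind the real one
```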


4 Experimental head-worn projective display using lightweight optics. (Courtesy of Jannick Rolland, University of Central Florida, and Frank Biocca, Michigan State University.)

5 Projection display used to camouflage a haptic input device. The haptic input device normally doesn't reflect projected graphics (top). The haptic input device coated with retroreflective material appears transparent (bottom).

6 Images photographed through optical see-through display supporting occlusion. (a) Transparent overlay. (b) Transparent overlay rendered taking into account real-world depth map. (c) LCD panel opacifies areas to be occluded. (d) Opaque overlay created by opacified pixels. (Courtesy of Kiyoshi Kiyokawa, Communications Research Lab.)


Second, most video see-through displays have a parallax error, caused by the cameras being mounted away from the true eye location. If you see the real world through cameras mounted on top of your head, your view is significantly different from what you’re normally used to, making it difficult to adapt to the display.14 The MR Systems Lab developed a relatively lightweight (340 grams) video graphics array (VGA) resolution video see-through display, with a 51-degree horizontal field of view, in which the imaging system and display system optical axes are aligned for each eye.15 Finally, most displays have fixed eye accommodation (focusing the eyes at a particular distance). Some prototype video and optical see-through displays can selectively set accommodation to correspond to vergence by moving the display screen or a lens through which it’s imaged. One version can cover a range of 0.25 meters to infinity in 0.3 seconds.16
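A short calculation shows why camera placement matters and what the quoted focus range means. The sketch assumes a camera displaced by a hypothetical 0.1 m from the eye and a point straight ahead of the eye; the angle between the two lines of sight approximates the parallax error. It also expresses focus distances in diopters (inverse meters), the usual unit for accommodation.

```python
import math


def parallax_error_deg(offset_m: float, distance_m: float) -> float:
    """Angle in degrees between the eye's and the camera's lines of sight to a
    point straight ahead of the eye, when the camera sits offset_m away from
    the eye (for example, above it on the head)."""
    return math.degrees(math.atan2(offset_m, distance_m))


def diopters(distance_m: float) -> float:
    """Accommodation demand in diopters (1 / meters); infinity maps to 0."""
    return 0.0 if math.isinf(distance_m) else 1.0 / distance_m


# Hypothetical 0.1 m camera offset: the error grows sharply at arm's length.
for d in (0.5, 2.0, 10.0):
    print(f"{d:>5} m: {parallax_error_deg(0.1, d):.2f} deg")  # 11.31, 2.86, 0.57 deg

# The 0.25 m to infinity focus range quoted above spans 4 diopters.
print(diopters(0.25) - diopters(math.inf))  # 4.0
```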

New tracking sensors and approaches

Accurately tracking the user’s viewing orientation and position is crucial for AR registration. Rolland et al.17 give a recent overview of tracking systems. For prepared, indoor environments, several systems demonstrate excellent registration. Typically such systems employ hybrid-tracking techniques (such as magnetic and video sensors) to exploit the strengths and compensate for the weaknesses of individual tracking technologies. A system combining accelerometers and video tracking demonstrates accurate registration even during rapid head motion.18 The Single Constraint at a Time (Scaat) algorithm has also improved tracking performance. Scaat incorporates individual measurements at the exact time they occur, resulting in faster update rates, more accurate solutions, and autocalibrated parameters.19 Two new scalable tracking systems, InterSense’s Constellation20 and 3rdTech’s HiBall,21 can cover the large indoor environments needed by some AR applications.
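The single-constraint idea can be illustrated with a one-dimensional Kalman filter that folds in each scalar measurement at the moment it was taken, instead of waiting to collect a complete, simultaneous set of observations. This is a minimal sketch of that idea, not the published Scaat algorithm: it assumes a constant-velocity motion model, a scalar position measurement, and illustrative noise values (`q`, `r`), and it omits Scaat's autocalibration.

```python
from dataclasses import dataclass


@dataclass
class ScalarTracker:
    """Constant-velocity Kalman filter for one coordinate.

    State: position x and velocity v, with 2x2 covariance P.  q is the assumed
    process-noise (acceleration) variance.
    """
    x: float = 0.0
    v: float = 0.0
    P: tuple = ((1.0, 0.0), (0.0, 1.0))
    t: float = 0.0
    q: float = 1.0

    def predict(self, t_new: float) -> None:
        """Propagate state and covariance to time t_new."""
        dt = t_new - self.t
        (p00, p01), (p10, p11) = self.P
        self.x += self.v * dt                      # x' = x + v dt
        # P' = F P F^T + Q, with F = [[1, dt], [0, 1]] and white-noise acceleration Q
        self.P = (
            (p00 + dt * (p01 + p10) + dt * dt * p11 + self.q * dt**4 / 4,
             p01 + dt * p11 + self.q * dt**3 / 2),
            (p10 + dt * p11 + self.q * dt**3 / 2,
             p11 + self.q * dt**2),
        )
        self.t = t_new

    def update(self, t_meas: float, z: float, r: float) -> None:
        """Fold in ONE scalar position measurement z (variance r) taken at t_meas."""
        self.predict(t_meas)                       # advance exactly to the measurement time
        (p00, p01), (p10, p11) = self.P
        s = p00 + r                                # innovation variance, H = [1, 0]
        k0, k1 = p00 / s, p10 / s                  # Kalman gain
        innovation = z - self.x
        self.x += k0 * innovation
        self.v += k1 * innovation
        self.P = ((p00 - k0 * p00, p01 - k0 * p01),   # P = (I - K H) P
                  (p10 - k1 * p00, p11 - k1 * p01))


# Measurements from two hypothetical sensors arrive at different times and with
# different noise; each is processed alone, as soon as it arrives.
tracker = ScalarTracker()
for t, z, r in [(0.010, 0.010, 1e-4), (0.013, 0.012, 4e-4), (0.020, 0.021, 1e-4)]:
    tracker.update(t, z, r)
print(f"x = {tracker.x:.4f} m, v = {tracker.v:.3f} m/s at t = {tracker.t:.3f} s")
```

Because every update uses only one measurement, sensors with different rates and timing can be mixed without resynchronizing them.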

Visual tracking generally relies on modifying the environment with fiducial markers placed at known locations. The markers can vary in size to improve tracking range,22 and the computer-vision techniques that track by using fiducials can update at 30 Hz.23
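A common building block of square-fiducial trackers is the homography that maps a marker's known corner layout onto its four detected image corners; given the camera's intrinsic parameters, the camera pose relative to the marker can then be recovered from it. The sketch below solves the standard direct linear transformation (DLT) for that homography with the bottom-right entry fixed to 1, using plain Gaussian elimination so that it needs only the Python standard library. Corner detection, marker identification, and pose extraction are omitted, and the function names and sample coordinates are hypothetical; we don't claim this is how any specific toolkit is implemented.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]  # augmented
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (e.g. three collinear corners)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_corners(marker_pts, image_pts):
    """3x3 homography H (normalized so H[2][2] = 1) mapping four points on the
    marker plane to their four detected image positions (DLT)."""
    a, b = [], []
    for (x, y), (u, v) in zip(marker_pts, image_pts):
        # From u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v:
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(H, x, y):
    """Map a marker-plane point (x, y) to image coordinates."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Corners of a hypothetical 80 mm square marker and their (made-up) detected pixels.
marker = [(0, 0), (80, 0), (80, 80), (0, 80)]
image = [(320, 240), (400, 246), (394, 330), (316, 322)]
H = homography_from_corners(marker, image)
u, v = apply_homography(H, 40, 40)       # where the marker centre should appear
print(f"marker centre at ({u:.1f}, {v:.1f}) px")
```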

While some recent AR systems demonstrate robust and compelling registration in prepared, indoor environments, there’s still much to do in tracking and calibration. Ongoing research includes sensing the entire environment, operating in unprepared environments, minimizing latency, and reducing calibration requirements.

Environment sensing. Effective AR requires knowledge of the user’s location and the position of all other objects of interest in the environment. For example, an AR system needs a depth map of the real scene to support occlusion when rendering. One system demonstrates real-time depth-map extraction using several cameras, where the depth map is reprojected to a new viewing location.24 Kanade’s 3D dome drives this concept to its extreme with 49 cameras that capture a scene for later virtual replay.25
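Reprojecting a depth map to a new viewing location amounts to back-projecting each pixel into 3D using its depth and projecting the resulting point into the new camera. The sketch below assumes an ideal pinhole camera with hypothetical intrinsics (focal length `f` in pixels, principal point `cx`, `cy`) and a new viewpoint that differs only by a translation; a full implementation would also handle rotation, holes, and pixels that become occluded in the new view.

```python
def backproject(u, v, depth, f, cx, cy):
    """Pixel (u, v) with depth measured along the optical axis -> 3D point
    (x, y, z) in the camera frame of an ideal pinhole camera."""
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)


def project(point, f, cx, cy):
    """3D camera-frame point -> (u, v, depth) in an ideal pinhole camera."""
    x, y, z = point
    return (f * x / z + cx, f * y / z + cy, z)


def reproject_pixel(u, v, depth, t, f=500.0, cx=320.0, cy=240.0):
    """Where depth-map pixel (u, v) lands, and at what depth, when seen from a
    second camera translated by t = (tx, ty, tz) with no rotation.

    The intrinsics f, cx, cy are hypothetical defaults.  A point at p in the
    original camera frame is at p - t in the translated camera's frame.
    """
    x, y, z = backproject(u, v, depth, f, cx, cy)
    return project((x - t[0], y - t[1], z - t[2]), f, cx, cy)


# The centre pixel, 2 m away, seen from a viewpoint 0.1 m to the right:
print(reproject_pixel(320, 240, 2.0, t=(0.1, 0.0, 0.0)))  # (295.0, 240.0, 2.0)
```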

Outdoor, unprepared environments. In outdoor and mobile AR applications, it generally isn’t practical to cover the environment with markers. A hybrid compass/gyroscope tracker provides motion-stabilized orientation measurements at several outdoor locations26 (Figure 7). With the addition of video tracking (not in real time), the system produces nearly pixel-accurate results on known landmark features.27,28 The Townwear system29 uses custom-packaged fiber-optic gyroscopes for high accuracy and low drift rates.30 Real-time position outdoors is usually tracked with either the Global Positioning System (GPS) or dead-reckoning techniques, although both have significant limitations (for example, GPS requires a clear view of the sky).
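Hybrid compass/gyroscope trackers exploit complementary error characteristics: a gyroscope responds quickly and smoothly but drifts over time, while a compass is drift-free but noisy and easily disturbed. A complementary filter is one simple way to combine the two; we use it here purely as an illustration, not as the filter implemented in the cited systems. The blend factor `k`, the gyro bias, and the sample rate are assumptions.

```python
def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def fuse_heading(heading: float, gyro_rate_dps: float, compass_deg: float,
                 dt: float, k: float = 0.02) -> float:
    """One complementary-filter step for yaw (heading), in degrees.

    The gyro rate is integrated for short-term motion; a small fraction k of the
    remaining gap to the compass reading is applied each step, which removes
    gyro drift slowly while filtering out compass noise.
    """
    predicted = heading + gyro_rate_dps * dt        # integrate the gyro
    error = wrap_deg(compass_deg - predicted)       # shortest signed difference
    return wrap_deg(predicted + k * error)          # nudge toward the compass


# A stationary user facing 90 degrees; the gyro has a 0.5 deg/s bias.
heading = 0.0
for _ in range(2000):                                # 20 s at 100 Hz
    heading = fuse_heading(heading, gyro_rate_dps=0.5, compass_deg=90.0, dt=0.01)
print(round(heading, 1))  # 90.2: a small constant offset instead of 10 deg of drift
```

Integrating the biased gyro alone would accumulate 10 degrees of error over the same 20 seconds; the compass term bounds that error at the cost of some lag.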

Ultimately, tracking in unprepared environments may rely heavily on tracking visible natural features (such as objects that already exist in the environment, without modification).31 If a database of the environment is available, we can base tracking on the visible horizon silhouette32 or on rendered predicted views of the surrounding buildings, which we then match against the video.33 Alternatively, given a limited set of known features, a tracking system can automatically select and measure new natural features in the environment.34 There’s a significant amount of research on recovering camera motion given a video sequence with no tracking information. Today, those approaches don’t run in real time and are best suited for special effects and postproduction. However, these algorithms can potentially apply to AR if they can run in real time and operate causally (without using knowledge of what occurs in the future). In one such example,35 a tracking system employs planar features, indicated by the user, to track the user’s change in orientation and position.

Low latency. System delays are often the largest source of registration errors. Predicting motion is one way to reduce the effects of delays. Researchers have attempted to model motion more accurately36 and to switch between multiple models.37 We can schedule system latency to reduce errors38 or minimize them altogether through careful system design.39 Shifting a prerendered image at the last instant can effectively compensate for pan-tilt motions.40 Through image warping, such corrections can potentially compensate for delays in 6D motion (both translation and rotation).41
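Two of these ideas can be sketched in a few lines. The first predicts head yaw at display time from the latest angular velocity (a constant-velocity model, the simplest possible predictor); the second converts a small residual yaw change, measured just before the frame is shown, into a horizontal image shift. The focal length in pixels and the sample numbers are hypothetical, and the shift formula assumes pure rotation about the viewpoint and content near the image center.

```python
import math


def predict_yaw(yaw_deg: float, rate_dps: float, latency_s: float) -> float:
    """Constant-velocity prediction of yaw at the time the frame is displayed."""
    return yaw_deg + rate_dps * latency_s


def pan_shift_px(delta_yaw_deg: float, focal_px: float) -> float:
    """Magnitude of the horizontal image shift that compensates a small yaw
    change, for content near the image centre (pure rotation assumed)."""
    return focal_px * math.tan(math.radians(delta_yaw_deg))


# Head turning at 100 deg/s with 50 ms of end-to-end delay:
print(predict_yaw(30.0, 100.0, 0.050))                 # 35.0: render for 35 deg, not 30
# If the head has turned a further 0.3 deg by scan-out, shift the prerendered image:
print(round(pan_shift_px(0.3, focal_px=800.0), 1))     # 4.2 pixels
```

The direction of the shift depends on the image coordinate convention; only its magnitude is computed here.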


Calibration and autocalibration

AR systems generally require extensive calibration to produce accurate registration. Measurements may include camera parameters, field of view, sensor offsets, object locations, distortions, and so forth. The AR community uses well-established basic principles of camera calibration and has developed many manual AR calibration techniques. One way to avoid a calibration step is to develop calibration-free renderers. Since Kutulakos and Vallino introduced their approach of calibration-free AR based on a weak perspective projection model,42 Seo and Hong have extended it to cover perspective projection, supporting traditional illumination techniques.43 Another example obtains the camera focal length35 without an explicit metric calibration step. The other way to reduce calibration requirements is autocalibration. Such algorithms use redundant sensor information to automatically measure and compensate for changing calibration parameters.19,44

Interfaces and visualization

In the last five years, a growing number of researchers have considered how users will interact with AR applications and how to effectively present information on AR displays.

8 RV-Border Guards, an AR game.

User interface and interaction

Until recently, most AR prototypes concentrated on displaying information that was registered with the world and didn’t significantly concern themselves with how potential users would interact with these systems. Prototypes that supported interaction often based their interfaces on desktop metaphors (for example, they presented on-screen menus or required users to type on keyboards) or adapted designs from virtual environments research (such as using gesture recognition or tracking 6D pointers). In certain applications, such techniques are appropriate. In the RV-Border Guards game,45 for example, users combat virtual monsters by using gestures to control their weapons and shields (Figure 8).

However, it’s difficult to interact with purely virtual information. There are two main trends in AR interaction research:

■ using heterogeneous devices to leverage the advantages of different displays and

■ integrating with the physical world through tangible interfaces.

9 Heterogeneous AR systems using projected (top) and see-through handheld (bottom) displays.

10 Heterogeneous displays in Emmie, combining head-worn, projected, and private flat-screen displays.

Courtesy A. Butz, T. Höllerer, S. Feiner, B. MacIntyre, C. Beshers, Columbia Univ.

Different devices best suit different interaction techniques, so using more than one device lets each interaction task use the most appropriate device. For example, a handheld tablet is well suited to interacting with a text document.46 In the Augmented Surfaces system47 (Figure 9), users manipulate data through a variety of real and virtual mechanisms and can interact with data through projective and handheld displays. Similarly, the Emmie system48 mixes several display and device types and lets information be moved between devices to improve interaction (Figure 10). AR systems can also simulate the benefits of multiple devices. In the Studierstube system, the Personal Interaction Panel (PIP)49 is a tracked blank physical board the user holds, upon which virtual controls or parts of the world are drawn (Figure 11). The haptic feedback from the physical panel gives benefits similar to those of a handheld display in this case.

Tangible interfaces support direct interaction with the physical world by emphasizing the use of real, physical objects and tools. In one example, the user wields a real paddle to manipulate furniture models in a prototype interior design application.50 Through pushing, tilting, swatting, and other motions, the user can select pieces of furniture, drop them into a room, push them to the desired locations, and remove them from the room (Figure 12). In the AR2 Hockey system, two users play an air hockey game by moving a real object that represents each user’s paddle in the virtual world.51

11 The Studierstube (top) and Magic Book (bottom) collaborative AR systems, with two users wearing see-through HMDs. (Courtesy Dieter Schmalstieg, Vienna University of Technology, and Mark Billinghurst, Human Interface Technologies Lab.)

12 User wields a real paddle to pick up, move, drop, and destroy models. (Courtesy Hirokazu Kato.)

13 Data filtering to reduce density problems. Unfiltered view (top) and filtered view (bottom), from Julier et al.55 (Courtesy Naval Research Lab.)

Researchers have explored other interaction possibilities. For example, the Magic Book52 depicts live VR environments on the pages of a book and lets one or more users enter one of the VR environments. When one user switches from AR into the immersive VR world depicted on the page, the other AR users see an avatar appear in the environment on the book page (Figure 11). The Magic Book takes advantage of video see-through AR displays, letting the users’ views of the world be completely blocked out when they’re in the VR environment.

Visualization problems

Researchers are beginning to address fundamental problems of displaying information in AR displays.

Courtesy INRIA

Error estimate visualization. In some AR systems, registration errors are significant and unavoidable. For example, the measured location of an object in the environment may not be known accurately enough to avoid a visible registration error. Under such conditions, one approach to rendering an object is to visually display the area in screen space where the object could reside, based on expected tracking and measurement errors.53 As long as the actual errors stay within these expected bounds, this guarantees that the virtual representation always contains the real counterpart. Another approach when rendering virtual objects that should be occluded by real objects is to use a probabilistic function that gradually fades out the hidden virtual object along the edges of the occluded region, making registration errors less objectionable.54
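The first approach can be sketched by surrounding the object's estimated position with an uncertainty box and bounding the box's projection on screen. This is a minimal sketch under simplifying assumptions we introduce: an ideal pinhole camera with hypothetical intrinsics, an axis-aligned box of half-width `err_m` in camera coordinates, and a box lying entirely in front of the camera. The cited work derives the region from expected tracking and measurement errors, which this sketch does not model.

```python
from itertools import product


def screen_error_region(center, err_m, f=800.0, cx=320.0, cy=240.0):
    """Screen-space rectangle (u_min, v_min, u_max, v_max) that contains the
    projection of every point within +/- err_m of `center` along each axis.

    center is (x, y, z) in meters in the camera frame (z forward).  Perspective
    projection is a linear-fractional function, so over the box its extremes
    occur at the 8 corners; bounding the projected corners therefore suffices.
    The pinhole intrinsics are hypothetical defaults.
    """
    x0, y0, z0 = center
    if z0 - err_m <= 0:
        raise ValueError("uncertainty box must lie entirely in front of the camera")
    us, vs = [], []
    for dx, dy, dz in product((-err_m, err_m), repeat=3):
        x, y, z = x0 + dx, y0 + dy, z0 + dz
        us.append(f * x / z + cx)
        vs.append(f * y / z + cy)
    return min(us), min(vs), max(us), max(vs)


# An object 2 m ahead whose position is known only to within 5 cm per axis:
print([round(c, 1) for c in screen_error_region((0.0, 0.0, 2.0), 0.05)])
# [299.5, 219.5, 340.5, 260.5]: draw this rectangle rather than a single point
```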

Data density. If we augment the real world with large amounts of virtual information, the display may become cluttered and unreadable. Unlike other applications that deal with large amounts of information, AR applications must also manage the interaction between the physical world and the virtual information, without changing the physical world. Julier et al.55 use a filtering technique based on a model of spatial interaction to reduce the displayed information to a minimum while keeping important information in view (Figure 13). The framework takes into account the goal of the user, the relevance of each object with respect to that goal, and the position of the user to determine whether each object should be shown. In a complementary approach, Bell et al.56 model the environment and track certain real entities, using this knowledge to ensure that virtual information isn’t placed on top of important parts of the environment or other information.

Advanced rendering

For some applications, virtual augmentations should be indistinguishable from real objects. While high-quality renderings and compositions aren’t currently feasible in real time, researchers are studying the problem of photorealistic rendering in AR and of removing real objects from the environment (that is, mediated reality).

Mediated reality. The problem of removing real objects goes beyond extracting depth information from a scene; the system must also segment individual objects in that environment. Lepetit discusses a semiautomatic method for identifying objects and their locations in the scene through silhouettes.57 In some situations, this technique enables the insertion of virtual objects and deletion of real objects without an explicit 3D reconstruction of the environment (Figure 14).

Photorealistic rendering. A key requirement for improving the rendering quality of virtual objects in AR applications is the ability to automatically capture the environmental illumination and reflectance information. Three examples of work in this area are an approach that uses ellipsoidal models to estimate illumination parameters,58 photometric image-based rendering,59 and high dynamic range illumination capturing.60

14 Virtual and real occlusions. The brown cow and tree are virtual; the rest is real.

15 Two-dimensional shop floor plans and a 3D pipe model superimposed on an industrial pipeline.

Human factors studies and perceptual problems

Experimental results from human factors, perceptual studies, and cognitive science61 can help guide the design of effective AR systems. Drascic62 discusses 18 different design issues that affect AR displays. The issues include implementation errors (such as miscalibration), technological problems (such as vertical mismatch in image frames of a stereo display), and fundamental limitations in the design of current HMDs (the accommodation-vergence conflict). Rolland and Fuchs offer a detailed analysis of the different human factors in connection with optical and video see-through HMDs for medical applications.63 Finally, we need to better understand human factors related to the effects of long-term use of AR systems. Some significant factors include the following:

■ Latency. Delay causes more registration errors than all other sources combined. For close-range tasks, a simple rule of thumb is that one millisecond of delay causes one millimeter of error.64 More importantly, delay can reduce task performance. Delays as small as 10 milliseconds can make a statistically significant difference in performance on a task of guiding a ring over a bent wire.65

■ Depth perception. Accurate depth perception is a difficult registration problem. Stereoscopic displays help with depth perception, but current display technologies cause additional problems (including accommodation-vergence conflicts, or low resolution and dim displays causing objects to appear farther away than they really are62). Rendering objects with correct occlusion can ameliorate some depth perception problems.63 Consistent registration plays a crucial role in depth perception, even to the extent that accurately determining the eyepoint location as the eye rotates can affect perception. An analysis of different eyepoint locations to use in rendering an image concluded that the eye’s center of rotation yields the best position accuracy, but the center of the entrance pupil yields higher angular accuracy.66

■ Adaptation. User adaptation to AR equipment can negatively impact performance. One study investigating the effects of vertically displacing cameras above the user’s eyes in a video see-through HMD showed that subjects could adapt to the displacement, but after removing the HMD, the subjects exhibited a large overshoot in a depth-pointing task.14

■ Fatigue and eye strain. Uncomfortable AR displays may not be suitable for long-term use. One study found that binocular displays, where both eyes see the same image, cause significantly more discomfort, both in eyestrain and fatigue, than monocular or stereo displays.65
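The millimeter-per-millisecond rule of thumb is consistent with simple arithmetic: while the head rotates during the delay, a virtual object at viewing distance d drifts sideways by about d times the angle swept. The sketch below assumes a 0.5 m working distance and a moderate head rotation of 100 degrees per second; these are illustrative values, not figures from the cited studies.

```python
import math


def lateral_error_mm(distance_m: float, head_rate_dps: float, delay_ms: float) -> float:
    """Approximate registration error, in millimeters, for an object at
    distance_m when the image is rendered with a head pose delay_ms old while
    the head rotates at head_rate_dps.  Small-angle approximation:
    error = distance * angle swept during the delay."""
    angle_rad = math.radians(head_rate_dps) * delay_ms / 1000.0
    return distance_m * angle_rad * 1000.0


for delay in (1, 10, 100):
    print(delay, "ms ->", round(lateral_error_mm(0.5, 100.0, delay), 2), "mm")
# 0.87, 8.73, and 87.27 mm: roughly one millimeter per millisecond
```

Faster head motion or longer working distances scale the error proportionally.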

New applications

We’ve grouped the new applications into three areas: mobile, collaborative, and commercial applications. Before discussing these further, though, we would like to briefly highlight representative advances in the more traditional areas of assembly, inspection, and medical applications.

Curtis et al.67 describe the verification of an AR system for assembling aircraft wire bundles. Although limited by tracking and display technologies, their tests on actual assembly-line workers show that their AR system lets workers create wire bundles that work as well as those built by conventional approaches. This paper also emphasizes the need for iterative design and user feedback.

In their research,68 Navab and his colleagues take advantage of 2D factory floor plans and the structural properties of industrial pipelines to generate 3D models of the pipelines and register them with the user’s view of the factory, obviating the need for a general-purpose tracking system (Figure 15). Similarly, they take advantage of the physical constraints of a C-arm x-ray machine to automatically calibrate the cameras with the machine and register the x-ray imagery with the real objects.69

Fuchs and his colleagues are continuing work on medical AR applications, refining their tracking and display techniques to support laparoscopic surgery.70 New medical AR applications are also being explored. For example, Weghorst71 describes how to use AR to help treat akinesia (freezing gait), one of the common symptoms of Parkinson's disease.

Mobile applications

With advances in tracking and increased computing power, researchers are developing mobile AR systems. These may enable a host of new applications in navigation, situational awareness, and geolocated information retrieval.

16 A view through a see-through HMD shows a 3D model of a demolished building at its original location. (Courtesy T. Höllerer, S. Feiner, J. Pavlik, Columbia Univ.)

Researchers have been investigating mobile AR systems that operate in well-prepared indoor environments for some time. NaviCam, for example, augments the video stream collected by a handheld video camera.7 The environment is populated with fiducials, which the system uses to identify the type of objects in view and to place the augmentation without knowing the user's absolute position. The system provides simple information, such as a list of new journals on a bookshelf. Starner et al. are considering the applications and limitations of AR for wearable computers.72 Using an approach similar to NaviCam, they use virtual tags for registering graphics and consider the problems of finger tracking (as a surrogate mouse) and facial recognition.
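The core of the fiducial-based approach, in which markers both identify what is in view and anchor the augmentation in image space, can be sketched in a few lines of Python. This is a hypothetical illustration, not NaviCam's implementation: the `Detection` type stands in for the output of some marker detector, and the marker IDs, label text, and pixel offset are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """A fiducial found in the current video frame (hypothetical detector output)."""
    marker_id: int
    x: float  # image column of the marker center, in pixels
    y: float  # image row of the marker center, in pixels


# Hypothetical database mapping marker IDs to the object each marker tags.
ANNOTATIONS = {
    17: "Bookshelf 3: new journals this week",
    42: "Printer room",
}


def annotate(detections, label_offset=(0.0, -30.0)):
    """Return (text, x, y) labels placed relative to each recognized marker.

    Placement uses only the marker's position in the image, so the user's
    absolute position in the building never enters the computation.
    """
    labels = []
    for d in detections:
        text = ANNOTATIONS.get(d.marker_id)
        if text is None:
            continue  # unknown marker: nothing to draw
        labels.append((text, d.x + label_offset[0], d.y + label_offset[1]))
    return labels
```

For example, `annotate([Detection(17, 320.0, 240.0), Detection(99, 10.0, 10.0)])` returns a single label at (320.0, 210.0); the unrecognized marker 99 is ignored.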

The "New tracking sensors and approaches" section describes strategies for tracking in various outdoor environments; here we focus on examples of outdoor applications.

The first outdoor system was the Touring Machine.46 Developed at Columbia University, this self-contained system includes tracking (a compass, inclinometer, and differential GPS), a mobile computer with a 3D graphics board, and a see-through HMD. The system presents the user with world-stabilized information about an urban environment (the names of buildings and departments on the Columbia campus). The AR display is cross-referenced with a handheld display, which provides detailed information. More recent versions of this system render models of buildings that previously existed on campus, display paths that users need to take to reach objectives, and play documentaries of historical events that occurred at the observed locations73 (Figure 16). The Naval Research Lab (NRL) developed a similar system, the Battlefield Augmented Reality System (Figure 17), to help during military operations in urban environments.55 The system's goal is to augment the environment with dynamic 3D information (such as goals or hazards) usually conveyed on 2D maps. More recently, the system has added tools for authoring the environment with new 3D information that other system users then see.74
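To illustrate how compass and GPS readings like those used by the Touring Machine can produce world-stabilized labels, the following Python sketch computes where a landmark's name should appear horizontally in the display. It is a simplified, assumed model, not the Touring Machine's code: it uses only compass yaw (ignoring the inclinometer's pitch and roll), treats the display as an ideal pinhole camera, and the image width and field of view are hypothetical defaults.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def label_column(user_lat, user_lon, heading_deg, lm_lat, lm_lon,
                 image_width=800, hfov_deg=30.0):
    """Horizontal pixel at which to draw a landmark's label, or None if out of view.

    heading_deg is the compass yaw of the display. Pitch and roll are ignored
    in this sketch, as is the offset between the user's eye and the display.
    """
    rel = bearing_deg(user_lat, user_lon, lm_lat, lm_lon) - heading_deg
    rel = (rel + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    half_fov = hfov_deg / 2.0
    if abs(rel) > half_fov:
        return None  # landmark is outside the horizontal field of view
    # Pinhole model: focal length in pixels chosen so the FOV spans the image width.
    focal_px = (image_width / 2.0) / math.tan(math.radians(half_fov))
    return image_width / 2.0 + focal_px * math.tan(math.radians(rel))
```

As the user turns their head, `heading_deg` changes and the label moves across the display in the opposite direction, which is what keeps it visually attached to the building. A real system would also need full 3D orientation, eye-to-display calibration, and filtering of noisy sensor data.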

In the same area, Piekarski75 is developing user interaction paradigms and techniques for interactive model construction in a mobile AR environment. This system also lets an outdoor user see objects (such as an aircraft) that exist only in a virtual military simulator (Figure 18).


18 Two views of a combined augmented and virtual environment. (Courtesy Wayne Piekarski, Bernard Gunther, and Bruce Thomas, University of South Australia.)

19 AR in sports broadcasting. The annotations on the race cars and the yellow first-down line are inserted into the broadcast in real time. (Courtesy of NASCAR and Sportvision, top and bottom, respectively.)

ARQuake,76 designed using the same platform, blends users in the real world with those in a purely virtual environment. A mobile AR user plays as a combatant in the computer game Quake, where the game runs with a virtual model of the real environment. When GPS is unavailable, the system switches to visual tracking derived from the ARToolkit. The recently started Archeoguide project is developing a wearable AR system for providing tourists with information about a historic site in Olympia, Greece.77 Rekimoto discusses creating content for wearable AR systems.78
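The GPS-to-vision switch described for ARQuake amounts to choosing a pose source for each frame. A minimal sketch in Python follows; the `GpsFix` type and the marker-pose input are hypothetical stand-ins, not ARQuake's or the ARToolkit's actual interfaces.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical pose representation: (x, y, heading) in a local frame.
Pose = Tuple[float, float, float]


@dataclass
class GpsFix:
    pose: Pose
    valid: bool  # False when the receiver has no usable fix


def select_pose(gps: Optional[GpsFix], marker_pose: Optional[Pose]):
    """Return (source, pose): GPS when it has a valid fix, else a marker-based pose."""
    if gps is not None and gps.valid:
        return "gps", gps.pose
    if marker_pose is not None:
        return "vision", marker_pose  # fall back to visual (fiducial) tracking
    return "none", None  # no tracking available this frame
```

A deployed system would likely also smooth the transition so that the augmentation does not jump when the pose source changes; that refinement is omitted here.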

Mobile AR systems must be worn, which challenges system designers to minimize weight and bulk. With current technology, one approach is to move some of the computation load to remote servers, reducing the equipment the user must wear.79,80

Collaborative applications

Many AR applications can benefit from having multiple people simultaneously view, discuss, and interact with the virtual 3D models. As Billinghurst and Kato discuss,81 AR addresses two major issues with collaboration:

■ seamless integration with existing tools and practices, and

■ enhancing practice by supporting remote and collocated activities that would otherwise be impossible.

By using projectors to augment the surfaces in a collaborative environment (such as Rekimoto's Augmented Surfaces47), users are unencumbered, can see each other's eyes, and all see the same augmentations. However, this approach is limited to adding virtual information to the projected surfaces.

Tracked, see-through displays can alleviate this limitation by letting 3D graphics be placed anywhere in the environment. Examples of collaborative AR systems using see-through displays include both those that use see-through handheld displays (such as Transvision82) and see-through head-worn displays (such as Emmie,48 Magic Book,52 and Studierstube83). An example of multiple-system collaboration is the integration of mobile warfighters, engaged with virtual enemies via AR displays, with units in a VR military simulation.75,84

A significant problem with collocated, collaborative AR systems is ensuring that the users can establish a shared understanding of the virtual space, analogous to their understanding of the physical space. Because the graphics are overlaid independently on each user's view of the world, it's difficult to ensure that each user clearly understands what other users are pointing or referring to. In Studierstube, the designers attempt to overcome this problem by rendering virtual representations of the physical pointers, which are visible to all participants (Figure 11).

Numerous system designers have suggested the benefits of adaptive interfaces tailored to each user's interests and skills. The ability to personalize the information presented to each user also lets AR systems present private information to individuals without the risk that others will see it. In the Emmie system, Butz and his colleagues discuss the notion of privacy management in collaborative AR systems and present an approach to managing the visibility of information using the familiar metaphors of lamps and mirrors.48

Entertainment applications are another form of collaborative AR. Researchers have demonstrated a number of AR games, including AR air hockey,51 collaborative combat against virtual enemies,45 and an AR-enhanced pool game.85

Commercial applications

Recently, AR has been used for real-time augmentation of broadcast video, primarily to enhance sporting events and to insert or replace advertisements in a scene. An early example is the FoxTrax system, which highlights the location of a hard-to-see hockey puck as it moves rapidly across the ice.86
Figure 19 shows two current examples of AR in sports. In both systems, the environments are carefully modeled ahead of time, and the cameras are calibrated and precisely tracked. For some applications, augmentations are added solely through real-time video tracking. Delaying the video broadcast by a few frames eliminates the registration problems caused by system latency, because the tracking data for each frame is available before that frame goes on air. Furthermore, the predictable environment (uniformed players on a green, white, and brown field) lets the system use custom chroma-keying techniques to draw the yellow line only on the field rather than over the players.
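The two techniques in the paragraph above, delaying video until tracking data catches up and keying the virtual line to field colors, can be sketched as follows. This is a hypothetical illustration, not Sportvision's method: frames are plain nested lists of (r, g, b) tuples, video and tracking are assumed to share a frame counter, and the field colors and tolerance are invented for the example.

```python
from collections import deque


class FrameDelay:
    """Buffer video frames until the tracking sample for each frame has arrived.

    Assumes video frames and camera-tracking samples carry the same frame
    number; tracking may lag the video by a few frames.
    """

    def __init__(self):
        self._frames = deque()  # (frame_number, frame) in capture order
        self._poses = {}        # frame_number -> camera pose

    def push_frame(self, n, frame):
        self._frames.append((n, frame))

    def push_pose(self, n, pose):
        self._poses[n] = pose

    def pop_ready(self):
        """Return (frame, pose) pairs, in order, for every leading frame whose pose is known."""
        ready = []
        while self._frames and self._frames[0][0] in self._poses:
            n, frame = self._frames.popleft()
            ready.append((frame, self._poses.pop(n)))
        return ready


# Hypothetical field colors (grass, painted lines, dirt).
FIELD_COLORS = [(40, 140, 40), (235, 235, 235), (150, 110, 70)]


def is_field_color(rgb, tol=40):
    """True if every channel is within `tol` of some field color."""
    return any(all(abs(c - f) <= tol for c, f in zip(rgb, fc)) for fc in FIELD_COLORS)


def draw_line_behind_players(frame, line_pixels, color=(255, 220, 0)):
    """Paint the virtual line only on field-colored pixels, so players occlude it.

    `line_pixels` lists the (row, col) positions the line projects to,
    which a real system would compute from the tracked camera pose.
    """
    out = [row[:] for row in frame]  # copy rows; the input frame is left untouched
    for r, c in line_pixels:
        if is_field_color(out[r][c]):
            out[r][c] = color
    return out
```

The cost of this design is a fixed broadcast delay of a few frames; in exchange, each frame is augmented with the camera pose measured for that frame rather than a predicted one.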

With similar approaches, advertisers can embellish broadcast video with virtual ads and product placements (Figure 20).

20 Virtual advertising. The Pacific Bell and Pennsylvania Lottery ads are AR augmentations.


Future work

Apart from the few commercial examples described in the last section, the state of the art in AR today is comparable to the early years of VR: many research systems have been demonstrated, but few have matured beyond lab-based prototypes.87 We've grouped the major obstacles limiting the wider use of AR into three themes: technological limitations, user interface limitations, and social acceptance issues.

Technological limitations

Although we've seen much progress in the basic enabling technologies, their limitations remain the primary obstacle to deploying many AR applications. Displays, trackers, and AR systems in general need to become more accurate, lighter, cheaper, and less power hungry.

By describing problems from our common experiences in building outdoor AR systems, we hope to impart a sense of the many areas that still need improvement. Displays such as the Sony Glasstron are intended for indoor consumer use and aren't ideal for outdoor use. The display isn't very bright and completely washes out in bright sunlight. The image has a fixed focus to appear several feet away from the user, which is often closer than the outdoor landmarks. The equipment isn't nearly as portable as desired. Since the user must wear the PC, sensors, display, batteries, and everything else required, the end result is a cumbersome and heavy backpack.

Laptops today have only one CPU, limiting the amount of visual and hybrid tracking that we can do. Operating systems aimed at the consumer market aren't built to support real-time computing, while specialized real-time operating systems don't have the drivers to support the sensors and graphics in modern hardware.

Tracking in unprepared environments remains an enormous challenge. Outdoor demonstrations today have shown good tracking only with significant restrictions in operating range, often with sensor suites that are too bulky and expensive for practical use. Today's systems generally require extensive calibration procedures that an end user would find unacceptably complicated. Many connectors, such as universal serial bus (USB) connectors, aren't rugged enough for outdoor operation and are prone to breaking.

While we expect some improvements to come naturally from other fields, such as wearable computing, AR research can reduce these difficulties through improved tracking in unprepared environments and through calibration-free or autocalibration approaches that minimize set-up requirements.

User interface limitations

We need a better understanding of how to display data to a user and how the user should interact with the data. Most existing research concentrates on low-level perceptual issues, such as properly perceiving depth or how latency affects manipulation tasks. However, AR also introduces many high-level tasks, such as identifying what information should be provided, what the appropriate representation for that data is, and how the user should make queries and reports. For example, a user might want to walk down a street, look in a shop window, and query the inventory of that shop. To date, few have studied such issues. However, we expect significant growth in this area because research AR systems with sufficient capabilities are now more commonly available. For example, recent work suggests that the creation and presentation of narrative performances and structures may lead to more realistic and richer AR experiences.88

Social acceptance

The final challenge is social acceptance. Given a system with ideal hardware and an intuitive interface, how can AR become an accepted part of a user's everyday life, just like a mobile phone or a personal digital assistant (PDA)? Through films and television, many people are familiar with images of simulated AR. However, persuading a user to wear a system means addressing a number of issues. These range from fashion (will users wear a system if they feel it detracts from their appearance?) to privacy (the tracking required for displaying information can also be used for monitoring and recording). To date, little attention has been paid to these fundamental issues. However, they must be addressed before AR becomes widely accepted. ■